Abstract

We show that the principle of entropy increase may be exactly founded on a few axioms 
valid not only for quantum and classical statistics, but also for a wide range of statistical 
processes. 
1 Introduction

The second law of thermodynamics, or the principle of entropy increase, is an exact law of
nature. Exploring its foundation has been one of the most important topics in physics for more
than a century. From daily life we know that matter always approaches its equilibrium state.
Textbooks [1]-[4] tell us that the equilibrium state is the state with maximum entropy, and that
a state of a macroscopic system with larger entropy is more probable. This almost explains the
daily experience above. However, whether the system always goes from a less probable state to a
more probable one, or why the principle of entropy increase works, is still an open question. The
H-theorem of Boltzmann is a classical proof of a definite approach to equilibrium. It is based
on a model of macroscopic matter as a system of colliding classical particles, and is therefore
not general enough, even from the viewpoint of classical statistical physics. Recently we gave
an exact and general proof of the principle of entropy increase using general principles of
quantum theory [5]. In this paper we put the principle on an axiomatic basis that is consistent
with classical and quantum mechanics but not specific to them. We collect definitions and
axioms in section 2, and lemmas and theorems in section 3. Among them, theorem 4 is exactly the
principle of entropy increase. Section 4 is a discussion.



2 Definitions and axioms 



Definition 1: The evolution is a process not interrupted by measurement. 

Definition 2: A system is a collection of objects whose evolution in time is determined by itself
and is not correlated with any other object.

Definition 3: The state of a system at a given time is a property of the system at that time,
which determines which observables are certain, what their values are, as well as the change of
this property itself at that time.

This means that the existence of a state is conditional. If there is a state for the system under
consideration at a time $t_0$, it must have a state at any other time $t$ in the course of evolution,
which is determined by the original state at the time $t_0$. The differential equation governing the
state evolution is of first order in time, which in turn means that two different states remain
different during the evolution. These points are true both for classical and quantum mechanics.
On a set of independent events one may define a probability distribution.

Definition 4: Independence between two states means that, if the system is in one of them, the
observer can definitely not see the property which defines the other state.

Definition 5: A set of states for a system is complete if and only if any other state of the 
system depends on at least one state in it. 

In classical mechanics, two different states are always independent of each other, while in
quantum mechanics, independence of states means that they are orthogonal to each other. Two
different but nonorthogonal states depend on each other in the sense that there is a nonzero
probability $W_{ab}$ of seeing the property of one state $a$ while the system is in another state
$b$. In classical mechanics $W_{ab} = \delta_{ab}$, while in quantum mechanics the relation is
relaxed to $W_{ab} = W_{ba}$. Both in classical and in quantum mechanics, two independent states
remain independent of each other throughout their evolution. Since an orbital state of a particle
occupies a finite volume in its phase space, quantum states are numerable. Classical states are
innumerable in their original form. However, an effective method in classical statistical
mechanics is to let the phase volume of a state be finite, so that the states become numerable,
and to let the phase volume approach zero at the final stage of the derivation. If classical
theory is applicable to the problem, this method always gives the right answer.

Definition 6: The state with a certain value of a given observable is called an eigenstate of this
observable, and the corresponding value of the observable is called its eigenvalue.

In classical mechanics, every state is an eigenstate of all observables. In quantum mechanics,
however, the eigenstates of an observable have to be solved from the eigenequation of the
observable. In both cases the set of all independent eigenstates of a given observable is complete.

Definition 7: If a set of observables may be measured simultaneously with certain outcome,
and the result is complete enough to determine the state of the measured system, the set is
called a complete set of observables for the system.

Definition 8: Measurement of a complete set of observables on a system is called a complete
measurement.

While a complete measurement determines the state of the system, an incomplete measurement
cannot determine the state but may determine a probability distribution of the system
over a set of different states. In classical mechanics, this means a probability distribution over a
complete set of independent states. In quantum mechanics one handles this situation with a
Hermitian operator called the density operator. Its eigenstates are independent of each other, and
its eigenvalues may be regarded as the probabilities of finding the system in the corresponding
states. In both cases, an incomplete measurement determines a probability
distribution of the system over a complete set of independent states. Denoting the $n$th state
of the set by $|n\rangle$ and the probability of the system being in the state $|n\rangle$ by a
non-negative number $W_n$, we have the normalization relation $\sum_n W_n = 1$, and

Definition 9: The information of a system is given by

$$I = \sum_n W_n \ln W_n . \qquad (1)$$

The summation is over the complete set of independent states, and "information" is short for the
amount of information.
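
As a rough numerical illustration of definition 9, the following sketch evaluates $\sum_n W_n \ln W_n$ for two hypothetical four-state distributions. The function name `information` and the example distributions are assumptions made for this sketch and do not come from the paper; terms with $W_n = 0$ are skipped, in line with the usual convention $0 \ln 0 = 0$.

```python
import math

def information(probs):
    """Return sum_n W_n ln W_n, treating 0 ln 0 as 0."""
    probs = list(probs)
    if any(w < 0 for w in probs):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # Terms with w == 0 are skipped, which implements 0 ln 0 = 0.
    return sum(w * math.log(w) for w in probs if w > 0)

# Hypothetical examples over a complete set of four independent states.
certain = information([1.0, 0.0, 0.0, 0.0])      # the state is known exactly
uniform = information([0.25, 0.25, 0.25, 0.25])  # nothing is known about the state

print(certain, uniform)
```

With this sign convention the information is never positive: it is largest (zero) when the state is known with certainty and smallest when all states are equally probable.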

In classical and quantum physics, it is always possible to divide a macroscopic system into
subsystems, so that every subsystem is still large enough to be macroscopic but is already
macroscopically uniform, and the microscopic non-uniformity is still negligible.

Definition 10: The uniform system is a system in which one kind of observable (intensive
observable) takes the same value everywhere, and the values of other kinds of observables
(extensive observables) are proportional to each other.

Definition 11: The divisible system is either a uniform system or a system which may be
divided into uniform subsystems.

For a divisible system one may define the entropy. 

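Definition 12: In units of the Boltzmann constant $k$, the entropy is given by

$$S = -\sum_i I_i , \qquad (2)$$

in which the subscript $i$ labels the subsystems, all of which are uniform, and $I_i$ is the
information of subsystem $i$. The entropy is the negative sum of the information of these
subsystems.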

We assume our system satisfies the following axioms:

Axiom 1: The system is in one of its states at a given time.

Axiom 2: The probability $W_{ab}$ of finding the property of state $a$ in the state $b$ equals the
probability $W_{ba}$ of finding the property of state $b$ in the state $a$.

Axiom 3: Two eigenstates of the same observable with different eigenvalues are independent
of each other.

Axiom 4: The set of all independent eigenstates for a given observable is complete.

Axiom 5: The set of independent states for a system is numerable.




Axiom 6: The state at a time determines the state at any other time for the same system
throughout the whole process of evolution.

Axiom 7: Independent states remain independent of each other during the evolution in time.

Axiom 8: The state of the system after a complete measurement is the one determined by the
outcome of the measurement.

Axiom 9: The system may be described by a probability distribution over a complete set of
independent states.

Axiom 10: The system is divisible.

In classical and quantum statistical physics, these axioms are exactly satisfied. We expect
that they may still be satisfied in future new physics, and also in some processes other
than those in physics. This would make the results derived from them not only exactly true in
physics but also applicable to a wide range of other problems.

3 Lemmas and theorems 

Let us now recall some mathematical inequalities. They and their proofs can be found
elsewhere [3, 6]. Mathematically, we define $0 \ln 0 = \lim_{x \to 0} (x \ln x) = 0$.
Lemma 1. For any non-negative number $x$ we have

$$x \ln x \ge x - 1 , \qquad (3)$$

and the equality holds when and only when $x = 1$.
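
For orientation, the inequality $x \ln x \ge x - 1$ can be checked with elementary calculus. The following is a brief sketch, not necessarily the proof given in the cited references; the auxiliary function $f$ is a name introduced only here.

```latex
f(x) \equiv x \ln x - x + 1 , \qquad
f'(x) = \ln x , \qquad
f''(x) = \frac{1}{x} > 0 \quad (x > 0) .
```

Since $f'$ vanishes only at $x = 1$ and $f$ is strictly convex for $x > 0$, $f$ has its unique minimum $f(1) = 0$ there, so $f(x) \ge 0$ with equality only at $x = 1$. At $x = 0$ the convention $0 \ln 0 = 0$ gives $f(0) = 1 > 0$.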

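Lemma 2. For sets $[w_i]$ and $[x_i]$ of non-negative numbers with $\sum_i w_i = 1$ we have

$$\sum_i w_i x_i \ln x_i \ge \Big( \sum_i w_i x_i \Big) \ln \Big( \sum_i w_i x_i \Big) . \qquad (4)$$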



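Lemma 3. For sets $[W_i]$ and $[T_{ij}]$ of non-negative numbers with

$$\sum_i W_i = 1$$

and

$$\sum_i T_{ij} = \sum_j T_{ij} = 1 , \qquad (5)$$

we have

$$W'_j \equiv \sum_i T_{ji} W_i \ge 0 \quad \text{for every } j , \qquad (6)$$

$$\sum_j W'_j = 1 , \qquad (7)$$

and

$$\sum_j W'_j \ln W'_j \le \sum_i W_i \ln W_i . \qquad (8)$$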
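
Reading lemma 3 as the statement that mixing a probability distribution with a matrix whose rows and columns each sum to one cannot increase $\sum_i W_i \ln W_i$, a quick numerical check can be sketched as follows. The matrix `T`, the distribution `W` and the helper names are hypothetical values chosen for this illustration only.

```python
import math

def information(probs):
    """Return sum_i W_i ln W_i, treating 0 ln 0 as 0."""
    return sum(w * math.log(w) for w in probs if w > 0)

def mix(T, W):
    """Return the mixed distribution W'_j = sum_i T[j][i] * W[i]."""
    return [sum(T[j][i] * W[i] for i in range(len(W))) for j in range(len(T))]

# Hypothetical 3x3 matrix of non-negative numbers; every row and column sums to 1.
T = [
    [0.5, 0.3, 0.2],
    [0.2, 0.5, 0.3],
    [0.3, 0.2, 0.5],
]
W = [0.7, 0.2, 0.1]  # hypothetical initial distribution

W_mixed = mix(T, W)

print("mixed distribution:", W_mixed)
print("information before:", information(W))
print("information after: ", information(W_mixed))
```

In this example the mixed distribution is still normalized and its information is lower than that of the original distribution, as the lemma requires.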
Lemma 4. For positive numbers $[W_{ij}]$, $W_i = \sum_j W_{ij}$ and $W'_j = \sum_i W_{ij}$, with
$\sum_{ij} W_{ij} = 1$, we have

$$\sum_i W_i = \sum_j W'_j = 1 \qquad (9)$$

and

$$\sum_{ij} W_{ij} \ln W_{ij} \ge \sum_i W_i \ln W_i + \sum_j W'_j \ln W'_j . \qquad (10)$$

The equality holds when and only when $W_{ij} = W_i W'_j$ for all $i$ and $j$, that is, when
$W_{ij}$ may be factorized.
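
Lemma 4 compares the information of a joint distribution with the sum of the informations of its two marginal distributions. The sketch below evaluates the difference for one correlated and one factorized joint distribution; both distributions and all function names are assumptions made for illustration. In information theory this difference is known as the mutual information.

```python
import math

def information(probs):
    """Return sum W ln W, treating 0 ln 0 as 0."""
    return sum(w * math.log(w) for w in probs if w > 0)

def marginals(joint):
    """Return (W_i, W'_j): row sums and column sums of the joint table."""
    rows = [sum(row) for row in joint]
    cols = [sum(col) for col in zip(*joint)]
    return rows, cols

def information_gap(joint):
    """Joint information minus the sum of the two marginal informations."""
    rows, cols = marginals(joint)
    flat = [w for row in joint for w in row]
    return information(flat) - information(rows) - information(cols)

# Hypothetical correlated distribution: the two indices tend to agree.
correlated = [[0.4, 0.1],
              [0.1, 0.4]]

# Hypothetical factorized distribution W_ij = W_i * W'_j.
W_rows, W_cols = [0.6, 0.4], [0.3, 0.7]
factorized = [[a * b for b in W_cols] for a in W_rows]

print("gap (correlated):", information_gap(correlated))
print("gap (factorized):", information_gap(factorized))
```

The gap is positive for the correlated table and vanishes, up to rounding, for the factorized one, which matches the equality condition of the lemma.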

Consider a system. At a given time we do not know its state, but know the probability distribution
over a complete set of its independent states. Its information is therefore given by (1). We have

Theorem 1 (information conservation): The information of a system does not change in
the course of evolution.

Proof: From definition 9 we see that the information of a system depends only on the probability
distribution over a complete set of its independent states, and not on the contents of
these states. Since, according to axiom 7, the independence of states does not change during
the evolution, the probability distribution and therefore the information of the system does not
change either. That is,

$$I(t) = I(t_0) , \qquad (11)$$

in which $I(t)$ and $I(t_0)$ are the information of the system at two different times $t$ and $t_0$
in its course of evolution. The theorem is proven.

According to axioms 3 and 4, a complete set $[L]$ of observables for the system has a complete
set of independent eigenstates. Denoting this set by $[|m\rangle]$, and the probability of finding
the property of state $|m\rangle$ by $W'_m$, the information of the complete set $[L]$ of
observables for the system is defined by

$$I_{[L]} = \sum_m W'_m \ln W'_m . \qquad (12)$$

Denoting the probability of finding the property of state $|m\rangle$ in the state $|n\rangle$ by
$W_{m,n}$, we have

$$W'_m = \sum_n W_{m,n} W_n , \qquad (13)$$

$$\sum_m W_{m,n} = 1 , \qquad (14)$$

and by axiom 2 also

$$\sum_n W_{m,n} = \sum_m W_{m,n} = 1 . \qquad (15)$$

Since the probabilities $W_{m,n}$ are non-negative, according to lemma 3 and equation (1) we obtain

$$I_{[L]} \le I , \qquad (16)$$

and therefore have proven

Theorem 2: The information of a given complete set of observables for the system is not more
than the information of the system itself.
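
A two-state quantum example may make theorem 2 concrete. The sketch below assumes, as an illustration only, that the transition probabilities are $W_{m,n} = |\langle m|n\rangle|^2$ for a measurement basis rotated by an angle `theta` with respect to the eigenbasis of the density operator; the eigenvalues `W` and all names are hypothetical.

```python
import math

def information(probs):
    """Return sum W ln W, treating 0 ln 0 as 0."""
    return sum(w * math.log(w) for w in probs if w > 0)

def measured_distribution(W, theta):
    """Return W'_m = sum_n W_{m,n} W_n for a real two-state rotation by theta.

    The overlap matrix is symmetric, and each of its rows and columns sums to 1.
    """
    c2, s2 = math.cos(theta) ** 2, math.sin(theta) ** 2
    overlap = [[c2, s2],
               [s2, c2]]
    return [sum(overlap[m][n] * W[n] for n in range(2)) for m in range(2)]

W = [0.9, 0.1]  # hypothetical eigenvalues of a density operator
I_system = information(W)

results = []
for theta in (0.0, math.pi / 8, math.pi / 4):
    W_prime = measured_distribution(W, theta)
    results.append((theta, W_prime, information(W_prime)))
    print(f"theta={theta:.3f}  W'={W_prime}  I_L={information(W_prime):.4f}  I={I_system:.4f}")
```

For `theta = 0` the measured observables share the eigenstates of the density operator and the two informations coincide; for any other angle in this example the measured information is smaller, down to $-\ln 2$ at `theta = pi/4`.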

Now, let us divide the system into two parts $a$ and $b$. Suppose $[L_i]$, with $i = a$ or $b$, is
a complete set of observables of part $i$, $|n_i\rangle$ is their $n_i$th eigenstate, and
$[|n_i\rangle]$ is a complete set of independent states for part $i$. Therefore $[L_a, L_b]$ is a
complete set of observables, and $[|n_a n_b\rangle] = [|n_a\rangle |n_b\rangle]$ is a complete set of
independent states, both for the system. Denoting the probability of finding the property of state
$|n_a n_b\rangle$ in the state $|n\rangle$ by $W_{n_a n_b, n}$, the probability of finding part $a$
in the state $|n_a\rangle$ and part $b$ in the state $|n_b\rangle$ is

$$W_{n_a n_b} = \sum_n W_{n_a n_b, n} W_n , \qquad (17)$$

with normalization

$$\sum_{n_a n_b} W_{n_a n_b} = 1 . \qquad (18)$$

According to theorem 2, the information of the observables $[L_a, L_b]$ for the system is

$$I_{[L_a, L_b]} = \sum_{n_a n_b} W_{n_a n_b} \ln W_{n_a n_b} \le I . \qquad (19)$$

The probability of finding part $a$ in the state $|n_a\rangle$ and the probability of finding
part $b$ in the state $|n_b\rangle$ are

$$W_{n_a} = \sum_{n_b} W_{n_a n_b} \quad \text{and} \quad W_{n_b} = \sum_{n_a} W_{n_a n_b} , \qquad (20)$$

respectively. In (18)-(20), it is understood that the summation is over those $n_a$ and $n_b$ only
for which $W_{n_a n_b} > 0$. According to axiom 8, after the measurement of $[L_i]$, the state of
part $i$ would be one in the set $[|n_i\rangle]$. The probability for the presence of state
$|n_i\rangle$ is $W_{n_i}$. The information for part $i$ is

$$I_i = \sum_{n_i} W_{n_i} \ln W_{n_i} . \qquad (21)$$

From lemma 4 and equations (19)-(21) we see

$$I_a + I_b \le I . \qquad (22)$$

The equality holds when and only when $W_{n_a n_b} = W_{n_a} W_{n_b}$ for all $n_a$ and $n_b$, that
is, when the probability distribution is factorized. The latter means that the two parts of the
system are not correlated with each other. We may further subdivide the parts and apply (22) to
them again and again. The result is the statement

$$\sum_i I_i \le I , \qquad (23)$$

in which the summation is over all parts of the system. Therefore we have proven

Theorem 3: The sum of information of all parts of the system is no more than the information
of the system itself.

According to axiom 10, we may make every part of the system uniform. In this case,
by definition 12 for the entropy and equation (23), we see

$$S \ge -I \qquad (24)$$

for a system. The equality holds when and only when the parts of the system are not correlated
with each other. We then arrive at

Theorem 4 (principle of entropy increase): The entropy of a system, if it changes, can only
increase.

Proof: We prove the theorem operationally. Suppose we measured the entropy $S(t_0)$
of the system at the initial time $t_0$. According to definition 12, this means that we measured
the entropy of every uniform part of the system and summed them up. This operation had to
destroy the correlation between these parts, and made the probability distribution of the system
factorize into a product of the probability distributions of its parts. By eq. (24) and the
discussion after it, we see

$$S(t_0) = -I(t_0) , \qquad (25)$$

in which $I(t_0)$ is the information of the system at time $t_0$ after the measurement. After this
first measurement the system evolves according to its own dynamics with information conservation
(theorem 1). In this course various parts of the system become correlated because of the
interaction between them. The probability distribution of the system will not remain factorized
into the product of the probability distributions of its parts. Let us measure the entropy $S(t)$
at a time $t > t_0$ in this course. By eqs. (24), (11) and (25), we see

$$S(t) \ge -I(t) = -I(t_0) = S(t_0) , \qquad (26)$$

in which $I(t)$ is the information of the system just before the measurement at time $t$. If the
evolution of the system is interrupted by measurements at times
$t_0 < t_1 < t_2 < \ldots < t_{n-1} < t$, by the arguments resulting in (26) we see
$S(t) \ge S(t_{n-1}) \ge \ldots \ge S(t_2) \ge S(t_1) \ge S(t_0)$. In any case, we have

$$S(t) \ge S(t_0) . \qquad (27)$$

The interaction between different parts of the system makes these parts correlated, which in
turn makes the entropy of the system strictly increase. The theorem is therefore proven.
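
The operational argument can be followed in a toy model of two one-bit parts. The sketch below is only an illustration under stated assumptions: the "dynamics" is a hypothetical permutation of the four joint states (bit $b$ is replaced by $a$ XOR $b$), chosen because a permutation keeps independent states independent and therefore conserves the total information; the initial distributions and all names are made up for this example.

```python
import math
from itertools import product

def information(probs):
    """Return sum W ln W, treating 0 ln 0 as 0."""
    return sum(w * math.log(w) for w in probs if w > 0)

def entropy_of_parts(joint):
    """S = -(I_a + I_b), computed from the marginals of a two-bit distribution."""
    W_a = [sum(joint[(a, b)] for b in (0, 1)) for a in (0, 1)]
    W_b = [sum(joint[(a, b)] for a in (0, 1)) for b in (0, 1)]
    return -(information(W_a) + information(W_b))

def evolve(joint):
    """Hypothetical interaction: permute joint states via (a, b) -> (a, a XOR b)."""
    return {(a, a ^ b): w for (a, b), w in joint.items()}

# Factorized starting distribution, as right after a first entropy measurement.
W_a0, W_b0 = [0.9, 0.1], [0.8, 0.2]
joint0 = {(a, b): W_a0[a] * W_b0[b] for a, b in product((0, 1), repeat=2)}
joint1 = evolve(joint0)

I0, I1 = information(joint0.values()), information(joint1.values())
S0, S1 = entropy_of_parts(joint0), entropy_of_parts(joint1)

print("information before/after:", I0, I1)  # conserved by the permutation
print("entropy before/after:    ", S0, S1)  # sum over parts increases
```

In this example the total information is unchanged by the evolution, the initial entropy equals the negative of the initial information, and the entropy measured on the parts afterwards is larger because the correlation created by the interaction is not counted.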

4 Discussion 

The proof here is quite general. Besides the axioms stated in section 2, nothing is assumed.
Both classical and quantum statistics satisfy these axioms. Therefore, the principle of entropy
increase is exactly true in them. The axioms are not too special, and may also be satisfied by a
wide class of statistical processes. This means that we may perhaps find some statistical science
other than physics in which the principle of entropy increase is applicable as well.

From the proof we learn that the entropy of a system increases only because, when one
considers it, one always neglects the correlation information between different parts of the
system. This emphasizes the importance of the correlation information in a complete statistical
science.

This work is supported by the National Natural Science Foundation of China with Grant
number 10305001.